Reinforcement Learning by Comparing Immediate Reward

Authors

Punit Pandey

Deepshikha Pandey

Shishir Kumar

Abstract

This paper introduces an approach to reinforcement learning that compares immediate rewards, using a variation of the Q-Learning algorithm. Unlike conventional Q-Learning, the proposed algorithm compares the current reward with the immediate reward of the previous move and acts accordingly. Relative-reward-based Q-Learning is an approach towards interactive learning. Q-Learning is a model-free reinforcement learning method used to train agents. It is observed that, under normal circumstances, the algorithm takes more episodes to reach the optimal Q-value because of its normal or sometimes negative rewards. In the new form of the algorithm, agents select only those actions that have a higher immediate reward signal than the previous one. The contribution of this article is a new Q-Learning algorithm intended to maximize performance and reduce the number of episodes required to reach the optimal Q-value. The effectiveness of the proposed algorithm is simulated in a 20 x 20 deterministic grid-world environment, and results for the two forms of the Q-Learning algorithm are given.

Keywords: Reinforcement Learning, Q-Learning Method, Relative Reward, Relative Q-Learning Method.
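The abstract does not state the update rule or the exact selection criterion, so the following is only a minimal sketch of how the comparison might look in a 20 x 20 deterministic grid world. It uses the standard tabular Q-Learning update; the relative-reward filter in `choose_action`, the reward values (-1 per step, -5 for hitting a wall, +100 at the goal), the use of "at least as high" rather than strictly higher, and all function names are assumptions made for illustration, not the authors' implementation.

```python
import random

SIZE = 20                                     # 20 x 20 grid, as in the abstract
GOAL = (SIZE - 1, SIZE - 1)                   # assumed goal position
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Deterministic transition; returns (next_state, reward). Rewards are assumed."""
    r, c = state
    dr, dc = ACTIONS[action]
    nr, nc = r + dr, c + dc
    if not (0 <= nr < SIZE and 0 <= nc < SIZE):
        return state, -5.0                    # bumped into a wall, stay in place
    if (nr, nc) == GOAL:
        return (nr, nc), 100.0
    return (nr, nc), -1.0


def new_q():
    """Q-table with one list of action values per grid cell."""
    return {(r, c): [0.0] * len(ACTIONS) for r in range(SIZE) for c in range(SIZE)}


def q_update(Q, s, a, reward, s_next, alpha=0.5, gamma=0.9):
    """Standard tabular Q-Learning update."""
    Q[s][a] += alpha * (reward + gamma * max(Q[s_next]) - Q[s][a])


def choose_action(Q, s, prev_reward, epsilon, rng, relative):
    """Epsilon-greedy choice; with relative=True, exploration is restricted to
    actions whose immediate reward is at least the previous move's reward
    (an assumed reading of the abstract). Falls back to all actions if none qualify."""
    if rng.random() < epsilon:
        candidates = list(range(len(ACTIONS)))
        if relative and prev_reward is not None:
            better = [a for a in candidates if step(s, a)[1] >= prev_reward]
            candidates = better or candidates
        return rng.choice(candidates)
    best = max(Q[s])
    return rng.choice([a for a in range(len(ACTIONS)) if Q[s][a] == best])


def run_episode(Q, rng, relative, epsilon=0.1, max_steps=2000):
    """Runs one episode from (0, 0); returns the number of steps taken."""
    s, prev_reward = (0, 0), None
    for t in range(max_steps):
        a = choose_action(Q, s, prev_reward, epsilon, rng, relative)
        s_next, reward = step(s, a)
        q_update(Q, s, a, reward, s_next)
        s, prev_reward = s_next, reward
        if s == GOAL:
            return t + 1
    return max_steps
```

To compare the two forms, one could call `run_episode` repeatedly on two separate tables from `new_q()`, once with `relative=False` and once with `relative=True`, and track the steps per episode. Because the filter looks up the reward of each candidate move through `step`, this sketch relies on the environment being deterministic.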

Similar resources

Efficient exploration for optimizing immediate reward

We consider the problem of learning an effective behavior strategy from reward. Although much studied, the issue of how to use prior knowledge to scale optimal behavior learning up to real-world problems remains an important open issue. We investigate the inherent data-complexity of behavior-learning when the goal is simply to optimize immediate reward. Although easier than reinforcement learni...

Shimkin 4 Reinforcement Learning – Basic Algorithms

Our agent usually has only partial knowledge of its environment, and therefore will use some form of learning scheme, based on the observed signals. To start with, the agent needs to use some parametric model of the environment. We shall use the model of a stationary MDP, with given state space and action space. However, the state transition matrix P = (p(s′|s, a)) and the immediate reward fun...

An Analysis of Feature Selection and Reward Function for Model-Based Reinforcement Learning

In this paper, we propose a series of correlation-based feature selection methods for dealing with high dimensionality in feature-rich environments for model-based Reinforcement Learning (RL). Real-world RL tasks usually involve high-dimensional feature spaces where standard RL methods often perform badly. Our proposed approach adopts correlation among state features as a selection criterion. The...

Dynamic Obstacle Avoidance by Distributed Algorithm based on Reinforcement Learning (RESEARCH NOTE)

In this paper we focus on the application of reinforcement learning to obstacle avoidance in dynamic environments in wireless sensor networks. A distributed algorithm based on reinforcement learning is developed for sensor networks to guide a mobile robot through the dynamic obstacles. The sensor network models the danger of the area under coverage as obstacles, and has the property of adoption o...

Journal: CoRR

Volume: abs/1009.2566

Publication date: 2010
